Goto

Collaborating Authors

 San José




PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations

Holzschuh, Benjamin, Liu, Qiang, Kohl, Georg, Thuerey, Nils

arXiv.org Artificial Intelligence

We introduce PDE-Transformer, an improved transformer-based architecture for surrogate modeling of physics simulations on regular grids. We combine recent architectural improvements of diffusion transformers with adjustments specific for large-scale simulations to yield a more scalable and versatile general-purpose transformer architecture, which can be used as the backbone for building large-scale foundation models in physical sciences. We demonstrate that our proposed architecture outperforms state-of-the-art transformer architectures for computer vision on a large dataset of 16 different types of PDEs. We propose to embed different physical channels individually as spatio-temporal tokens, which interact via channel-wise self-attention. This helps to maintain a consistent information density of tokens when learning multiple types of PDEs simultaneously. We demonstrate that our pre-trained models achieve improved performance on several challenging downstream tasks compared to training from scratch and also beat other foundation model architectures for physics simulations.


Iffy-Or-Not: Extending the Web to Support the Critical Evaluation of Fallacious Texts

Lim, Gionnieve, Kim, Juho, Perrault, Simon T.

arXiv.org Artificial Intelligence

Social platforms have expanded opportunities for deliberation with the comments being used to inform one's opinion. However, using such information to form opinions is challenged by unsubstantiated or false content. To enhance the quality of opinion formation and potentially confer resistance to misinformation, we developed Iffy-Or-Not (ION), a browser extension that seeks to invoke critical thinking when reading texts. With three features guided by argumentation theory, ION highlights fallacious content, suggests diverse queries to probe them with, and offers deeper questions to consider and chat with others about. From a user study (N=18), we found that ION encourages users to be more attentive to the content, suggests queries that align with or are preferable to their own, and poses thought-provoking questions that expands their perspectives. However, some participants expressed aversion to ION due to misalignments with their information goals and thinking predispositions. Potential backfiring effects with ION are discussed.


Local Differences, Global Lessons: Insights from Organisation Policies for International Legislation

Kaffee, Lucie-Aimée, Atanasova, Pepa, Rogers, Anna

arXiv.org Artificial Intelligence

The rapid adoption of AI across diverse domains has led to the development of organisational guidelines that vary significantly, even within the same sector. This paper examines AI policies in two domains, news organisations and universities, to understand how bottom-up governance approaches shape AI usage and oversight. By analysing these policies, we identify key areas of convergence and divergence in how organisations address risks such as bias, privacy, misinformation, and accountability. We then explore the implications of these findings for international AI legislation, particularly the EU AI Act, highlighting gaps where practical policy insights could inform regulatory refinements. Our analysis reveals that organisational policies often address issues such as AI literacy, disclosure practices, and environmental impact, areas that are underdeveloped in existing international frameworks. We argue that lessons from domain-specific AI policies can contribute to more adaptive and effective AI governance at the global level. This study provides actionable recommendations for policymakers seeking to bridge the gap between local AI practices and international regulations.


"It Felt Like I Was Left in the Dark": Exploring Information Needs and Design Opportunities for Family Caregivers of Older Adult Patients in Critical Care Settings

Fu, Shihan, Yao, Bingsheng, Desai, Smit, Hu, Yuqi, Sun, Yuling, Stonbraker, Samantha, Gao, Yanjun, Goldberg, Elizabeth M., Wang, Dakuo

arXiv.org Artificial Intelligence

Older adult patients constitute a rapidly growing subgroup of Intensive Care Unit (ICU) patients. In these situations, their family caregivers are expected to represent the unconscious patients to access and interpret patients' medical information. However, caregivers currently have to rely on overloaded clinicians for information updates and typically lack the health literacy to understand complex medical information. Our project aims to explore the information needs of caregivers of ICU older adult patients, from which we can propose design opportunities to guide future AI systems. The project begins with formative interviews with 11 caregivers to identify their challenges in accessing and interpreting medical information; From these findings, we then synthesize design requirements and propose an AI system prototype to cope with caregivers' challenges. The system prototype has two key features: a timeline visualization to show the AI extracted and summarized older adult patients' key medical events; and an LLM-based chatbot to provide context-aware informational support. We conclude our paper by reporting on the follow-up user evaluation of the system and discussing future AI-based systems for ICU caregivers of older adults.


Adaptive Genetic Algorithms for Pulse-Level Quantum Error Mitigation

Aguilar-Calvo, William, Núñez-Corrales, Santiago

arXiv.org Artificial Intelligence

Noise remains a fundamental challenge in quantum computing, significantly affecting pulse fidelity and overall circuit performance. This paper introduces an adaptive algorithm for pulse-level quantum error mitigation, designed to enhance fidelity by dynamically responding to noise conditions without modifying circuit gates. By targeting pulse parameters directly, this method reduces the impact of various noise sources, improving algorithm resilience in quantum circuits. We show the latter by applying our protocol to Grover's and Deutsch-Jozsa algorithms. Experimental results show that this pulse-level strategy provides a flexible and efficient solution for increasing fidelity during the noisy execution of quantum circuits. Our work contributes to advancements in error mitigation techniques, essential for robust quantum computing.


Zero-shot Shark Tracking and Biometrics from Aerial Imagery

Lalgudi, Chinmay K, Leone, Mark E, Clark, Jaden V, Madrigal-Mora, Sergio, Espinoza, Mario

arXiv.org Artificial Intelligence

The recent widespread adoption of drones for studying marine animals provides opportunities for deriving biological information from aerial imagery. The large scale of imagery data acquired from drones is well suited for machine learning (ML) analysis. Development of ML models for analyzing marine animal aerial imagery has followed the classical paradigm of training, testing, and deploying a new model for each dataset, requiring significant time, human effort, and ML expertise. We introduce Frame Level ALIgment and tRacking (FLAIR), which leverages the video understanding of Segment Anything Model 2 (SAM2) and the vision-language capabilities of Contrastive Language-Image Pre-training (CLIP). FLAIR takes a drone video as input and outputs segmentation masks of the species of interest across the video. Notably, FLAIR leverages a zero-shot approach, eliminating the need for labeled data, training a new model, or fine-tuning an existing model to generalize to other species. With a dataset of 18,000 drone images of Pacific nurse sharks, we trained state-of-the-art object detection models to compare against FLAIR. We show that FLAIR massively outperforms these object detectors and performs competitively against two human-in-the-loop methods for prompting SAM2, achieving a Dice score of 0.81. FLAIR readily generalizes to other shark species without additional human effort and can be combined with novel heuristics to automatically extract relevant information including length and tailbeat frequency. FLAIR has significant potential to accelerate aerial imagery analysis workflows, requiring markedly less human effort and expertise than traditional machine learning workflows, while achieving superior accuracy. By reducing the effort required for aerial imagery analysis, FLAIR allows scientists to spend more time interpreting results and deriving insights about marine ecosystems.


RealSeal: Revolutionizing Media Authentication with Real-Time Realism Scoring

Radharapu, Bhaktipriya, Krishna, Harish

arXiv.org Artificial Intelligence

The growing threat of deepfakes and manipulated media necessitates a radical rethinking of media authentication. Existing methods for watermarking synthetic data fall short, as they can be easily removed or altered, and current deepfake detection algorithms do not achieve perfect accuracy. Provenance techniques, which rely on metadata to verify content origin, fail to address the fundamental problem of staged or fake media. This paper introduces a groundbreaking paradigm shift in media authentication by advocating for the watermarking of real content at its source, as opposed to watermarking synthetic data. Our innovative approach employs multisensory inputs and machine learning to assess the realism of content in real-time and across different contexts. We propose embedding a robust realism score within the image metadata, fundamentally transforming how images are trusted and circulated. By combining established principles of human reasoning about reality, rooted in firmware and hardware security, with the sophisticated reasoning capabilities of contemporary machine learning systems, we develop a holistic approach that analyzes information from multiple perspectives. This ambitious, blue sky approach represents a significant leap forward in the field, pushing the boundaries of media authenticity and trust. By embracing cutting-edge advancements in technology and interdisciplinary research, we aim to establish a new standard for verifying the authenticity of digital media.


Prosody as a Teaching Signal for Agent Learning: Exploratory Studies and Algorithmic Implications

Knierim, Matilda, Jain, Sahil, Aydoğan, Murat Han, Mitra, Kenneth, Desai, Kush, Saran, Akanksha, Baraka, Kim

arXiv.org Artificial Intelligence

Agent learning from human interaction often relies on explicit signals, but implicit social cues, such as prosody in speech, could provide valuable information for more effective learning. This paper advocates for the integration of prosody as a teaching signal to enhance agent learning from human teachers. Through two exploratory studies--one examining voice feedback in an interactive reinforcement learning setup and the other analyzing restricted audio from human demonstrations in three Atari games--we demonstrate that prosody carries significant information about task dynamics. Our findings suggest that prosodic features, when coupled with explicit feedback, can enhance reinforcement learning outcomes. Moreover, we propose guidelines for prosody-sensitive algorithm design and discuss insights into teaching behavior. Our work underscores the potential of leveraging prosody as an implicit signal for more efficient agent learning, thus advancing human-agent interaction paradigms.